J221 Introduction to visualizing data: Final Project

By Nadia Lathan

Hey, Where’s my Bike?

Today I’ll be using data on bicycle thefts in Berkeley from January 2019 to July 2023, which I obtained from Berkeley PD this past summer.

I became interested in this topic after briefly reporting on a story about a woman whose electric bicycle, her family’s sole form of transportation, was stolen and how that affected her desire to continue using it in place of owning a car.

I spoke to Ahmed El-Geneidy, an urban planning professor at McGill University who studies bicycle theft, and I learned that bike theft tells just a tiny slice of the story behind why people do or don’t choose to cycle.

Feeling safe on the road with protected bike lanes is always what’s most important to people. However, the expectation that owning a bicycle means having it stolen at one point or another can play a small role in travel decision-making. Bike theft is underreported to police departments across the country because riders feel, often accurately, that officials don’t care or simply have bigger fish to fry. This, in turn, can reinforce theft, since people know that little to nothing will be done about it.

But there are numerous measures cities can implement to discourage theft or make it more difficult, especially in areas such as Berkeley, Oakland, and San Francisco, where cycling is a popular way to get around. Some examples include providing more bike racks and other kinds of designated bike parking, like lockers or stations. Public service campaigns teaching people how to properly lock their bikes (with a U-lock and to the frame) could also do wonders.

So although there’s always some chance a bike can get stolen even when it’s locked, especially during the summer months when demand for bikes is highest, theft is not inevitable and there are ways to mitigate it.

This made me interested in examining what trends exist in Berkeley, one of the country’s most bikeable cities, to see where, when, and how frequently bikes are stolen.

Here is the link to the data.

Interrogating the Data

  1. What is the source of the data? Is it a primary source? If it is a secondary source, what could you do to verify that it is accurate?

The source of the data is the Berkeley Police Department. It is a primary source. To verify the accuracy of the information, I would contact the police department.

  1. For what time period is the data?

January 2019 to July 2023. This is the time period I requested from BPD, and I confirmed it using the filter function on the “Case Reported” column in Google Sheets.
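The same range check can be sketched in Python using only the standard library. The sample values and the `MM/DD/YYYY` date format below are assumptions for illustration, not the actual BPD export, and `reported_range` is a hypothetical helper name.

```python
from datetime import datetime

# Hypothetical "Case Reported" values; the real export may use another date format.
case_reported = ["01/03/2019", "07/28/2023", "05/14/2021", "11/02/2020"]

def reported_range(values, fmt="%m/%d/%Y"):
    """Return the earliest and latest dates found in a list of date strings."""
    dates = [datetime.strptime(v, fmt).date() for v in values]
    return min(dates), max(dates)

earliest, latest = reported_range(case_reported)
print(earliest, latest)
```

If either end of the range falls outside the requested period, that would be worth raising with the department.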

  1. How many records are in the database?

There are 1,649 records. I found this by checking the row count of the “Case Number” column. I also double-checked this using the .info() method in Python.

  1. Are there any duplicates?

No, there are no duplicates. I checked this by creating a pivot table in Google Sheets, setting “Case Number” as the row and counting the values. I also double-checked using the .duplicated() method in Python.
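The idea behind the pivot-table check is to count how often each case number appears and flag anything above one. Here is a minimal standard-library sketch of that idea. It is not the `.duplicated()` call itself, the case numbers are made up, and `find_duplicates` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical case numbers; the last one repeats the first to show a hit.
case_numbers = ["19-00001", "19-00002", "20-00417", "19-00001"]

def find_duplicates(values):
    """Return the values that appear more than once, with their counts."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}

duplicates = find_duplicates(case_numbers)
print(duplicates)
```

An empty result corresponds to the "no duplicates" finding above.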

  1. Are there any consistency issues? (Things spelled different ways, etc…)

Yes, some case addresses don’t include a street number, which I assume means they represent intersections.
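One way to list those entries for follow-up is to flag every address that does not begin with a number. The sketch below assumes that rule is a reasonable proxy for "intersection". The addresses are invented examples and `lacks_street_number` is a hypothetical helper.

```python
import re

# Hypothetical addresses, not rows from the BPD data.
addresses = [
    "2100 SHATTUCK AVE",
    "BANCROFT WAY & TELEGRAPH AVE",
    "1947 CENTER ST",
    "UNIVERSITY AVE / SAN PABLO AVE",
]

# An address "has a street number" here if it starts with one or more digits.
HAS_STREET_NUMBER = re.compile(r"^\s*\d+")

def lacks_street_number(address):
    """True when the address does not start with a number."""
    return HAS_STREET_NUMBER.match(address) is None

candidates = [a for a in addresses if lacks_street_number(a)]
print(candidates)
```

The resulting list is what I would bring to the department to confirm that these entries are intersections.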

  1. Are numeric fields within valid ranges?

For the most part. Some case addresses don’t contain numbers and just have two street names listed.

  1. Is there missing data?

No, there is no missing data. I checked by scrolling through all of the roughly 1,600 rows after exporting the data from Tabula, uploading it to Google Sheets, and exporting it to an xlsx file.
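Scrolling works at this size, but a per-column count of empty cells is easier to repeat. The sketch below runs on a made-up three-row CSV. The column names other than “Case Number” and “Case Reported” are assumptions, and `count_missing` is a hypothetical helper.

```python
import csv
import io

# Hypothetical extract with two deliberately empty cells.
sample_csv = """Case Number,Case Reported,Case Address
19-00001,01/03/2019,2100 SHATTUCK AVE
19-00002,,1947 CENTER ST
20-00417,05/14/2020,
"""

def count_missing(csv_text):
    """Count empty or whitespace-only cells in each column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            if not (row[name] or "").strip():
                missing[name] += 1
    return missing

missing = count_missing(sample_csv)
print(missing)
```

A result of zero in every column would match the finding above.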

  1. What questions do you have about the data that you will need to speak with someone about?

I want to confirm whether entries that contain two roadways represent an intersection.

Findings & Visualizations

Bike theft across median income in Berkeley

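# Packages this section relies on (assumed to have been loaded earlier)
library(tidyverse)   # view(), glimpse(), %>%
library(tidycensus)  # get_acs()
library(tigris)      # tracts()
library(tmap)        # qtm()
library(sf)          # st_as_sf(), st_read(), st_transform()
library(leaflet)     # leaflet(), colorNumeric()
library(geojsonio)   # geojson_write()
library(priceR)      # format_dollars()

# Median income (ACS variable B06011_001) by census tract in Alameda County
alameda <- get_acs(geography = "tract",
              variables = "B06011_001",
              state = "CA",
              county = "Alameda",
              year = 2021)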
## Getting data from the 2017-2021 5-year ACS
view(alameda)

alameda_trt <- tracts("CA", county = 'Alameda', progress_bar = FALSE)
## Retrieving data for the year 2021
qtm(alameda_trt)

glimpse(alameda_trt)
## Rows: 379
## Columns: 13
## $ STATEFP  <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
## $ COUNTYFP <chr> "001", "001", "001", "001", "001", "001", "001", "001", "001"…
## $ TRACTCE  <chr> "428301", "428302", "428400", "430900", "431000", "432100", "…
## $ GEOID    <chr> "06001428301", "06001428302", "06001428400", "06001430900", "…
## $ NAME     <chr> "4283.01", "4283.02", "4284", "4309", "4310", "4321", "4280",…
## $ NAMELSAD <chr> "Census Tract 4283.01", "Census Tract 4283.02", "Census Tract…
## $ MTFCC    <chr> "G5020", "G5020", "G5020", "G5020", "G5020", "G5020", "G5020"…
## $ FUNCSTAT <chr> "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", "…
## $ ALAND    <dbl> 4300902, 2231410, 810521, 1096550, 954210, 1105772, 525761, 6…
## $ AWATER   <dbl> 1597298, 474770, 1168052, 0, 0, 0, 0, 383522, 1080420, 9638, …
## $ INTPTLAT <chr> "+37.7344929", "+37.7439709", "+37.7564138", "+37.6982784", "…
## $ INTPTLON <chr> "-122.2414385", "-122.2497790", "-122.2559068", "-122.0828322…
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-122.261 37..., MULTIPOLYGON (((…
alameda_map <- merge(alameda_trt, alameda, by.x = "GEOID", by.y = "GEOID")
qtm(alameda_map)

qtm(alameda_map, "estimate")

alameda_popup <- paste0("<b>TRACT: ", alameda_map$NAMELSAD, "</b><br />Income ", format_dollars(alameda_map$estimate))

alameda_palette <- colorNumeric(palette = "Greens", domain=alameda_map$estimate)

leaflet(alameda_map) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(stroke = FALSE,
              smoothFactor = 0.2,
              fillOpacity = 0.8,
              popup = alameda_popup,
              color = ~alameda_palette(estimate))
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
addresses <- read.csv("/Users/nadia/Desktop/R/addresses_geocodio.csv")

sf_data <- st_as_sf(addresses, coords = c("Longitude", "Latitude"), crs = 4326)

# Name the output with file =; an unnamed second argument is matched to lat, not file
geojson_write(sf_data, file = "myfile.geojson")
## Success! File is at myfile.geojson
## <geojson-file>
##   Path:       myfile.geojson
##   From class: geo_list
geojson_data <- st_read("myfile.geojson")
## Reading layer `myfile' from data source `/Users/nadia/Downloads/myfile.geojson' using driver `GeoJSON'
## Simple feature collection with 837 features and 13 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -122.3187 ymin: 0 xmax: 0 ymax: 38.04138
## Geodetic CRS:  WGS 84
alameda_map_wgs84 <- st_transform(alameda_map, 4326)

# Income polygons with reported theft locations drawn on top
bike_income_map <- leaflet(alameda_map_wgs84) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(
    fillOpacity = 0.8,
    popup = alameda_popup,
    # Refer to the column of the layer being drawn rather than a separate object
    color = ~alameda_palette(estimate)
  ) %>%
  addCircles(
    data = geojson_data,
    fillOpacity = 0.8,
    popup = ~Case.Address,
    color = "black",
    radius = 5
  )
  
bike_income_map

Here is a map of reported bike theft incidents in Berkeley from January 2019 to July 2023.

Overall, the distribution of theft is relatively evenly dispersed across income but is most concentrated around the downtown area and next to campus. Just from a visual standpoint, it doesn’t look like there is a strong relationship between income and prevalence of bike theft.

2019 saw the highest number of thefts, while 2021 had a low of 331 bikes stolen. This can likely be attributed to it being the first full year of the pandemic. The median is 362 bikes stolen per year.
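Yearly totals and their median can be computed along these lines. This is a sketch on invented dates, not the calculation behind the figures above. The `MM/DD/YYYY` format is an assumption and `thefts_per_year` is a hypothetical helper. Because the data for 2023 only runs through July, that partial year may be worth excluding before taking a median.

```python
from collections import Counter
from datetime import datetime
from statistics import median

# Hypothetical "Case Reported" values.
case_reported = [
    "01/03/2019", "06/11/2019", "09/30/2019",
    "05/14/2020", "08/02/2020",
    "03/21/2021",
]

def thefts_per_year(values, fmt="%m/%d/%Y"):
    """Count reports per calendar year."""
    return Counter(datetime.strptime(v, fmt).year for v in values)

per_year = thefts_per_year(case_reported)
median_per_year = median(per_year.values())
print(dict(per_year), median_per_year)
```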

According to El-Geneidy and other researchers, more bikes are stolen in the summer months than during any other time of the year. This is because there are more cyclists and a greater demand to cycle when it’s warm outside. We can see this trend holds true for Berkeley, where the largest number of bikes, 40 total, were stolen in May. A similar spike takes place in August, when demand peaks again as the school semester starts. On average, there were 28 bikes stolen every month in 2022.
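The monthly breakdown for a single year can be sketched the same way. The dates below are made up, the date format is assumed, and `monthly_counts` is a hypothetical helper. Months with no reports are kept as zeros so the average is taken over all twelve months.

```python
from collections import Counter
from datetime import datetime

# Hypothetical "Case Reported" values; the last one falls outside 2022 on purpose.
case_reported = [
    "05/02/2022", "05/19/2022", "05/30/2022",
    "08/08/2022", "08/21/2022",
    "11/14/2022",
    "05/05/2021",
]

def monthly_counts(values, year, fmt="%m/%d/%Y"):
    """Return a 12-item list of report counts for January through December of one year."""
    counts = Counter()
    for v in values:
        d = datetime.strptime(v, fmt)
        if d.year == year:
            counts[d.month] += 1
    return [counts[m] for m in range(1, 13)]

counts_2022 = monthly_counts(case_reported, 2022)
peak_month = counts_2022.index(max(counts_2022)) + 1  # 1 = January
average_per_month = sum(counts_2022) / 12
print(counts_2022, peak_month, average_per_month)
```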